Gözetimsiz Ayırıcı Dil Modeli Eğitimi (Unsupervised Discriminative Language Model Training)
Abstract
Discriminative language model (DLM; Turkish: ayırıcı dil modeli, ADM) training, the final step of an automatic speech recognition system, aims to select the most accurate word sequence from the candidate word sequences it uses as training examples. In supervised training, a manually transcribed reference text of the spoken utterance is available. In unsupervised training this information is missing, so the accuracy of the examples cannot be known exactly. This study investigates methods for estimating the accuracy of training examples without the reference text, and trains the DLM with variants of the perceptron algorithm adapted for structured prediction and reranking. The results show that unsupervised training can achieve an improvement of up to half of the gain obtained in the supervised case.
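The abstract gives no implementation details, so the following is only a minimal sketch of a reranking perceptron over N-best lists, not the authors' method. The feature map (unigram counts plus the recognizer score), the toy data, and all function names are assumptions made for illustration. The target index is taken as given here: in supervised training it would come from the reference transcription, while in the unsupervised setting it would have to come from an estimate of each hypothesis's accuracy.

```python
from collections import Counter

def features(text, base_score):
    """Hypothetical feature map: unigram counts plus the recognizer score."""
    feats = Counter(text.split())
    feats["__base__"] = base_score
    return feats

def score(weights, feats):
    """Linear model score: dot product of weights and features."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())

def rerank(weights, nbest):
    """Return the index of the highest-scoring hypothesis in an N-best list."""
    return max(range(len(nbest)),
               key=lambda i: score(weights, features(*nbest[i])))

def train_perceptron(data, epochs=5):
    """data: list of (nbest, target_index); nbest is a list of (text, base_score)."""
    weights = {"__base__": 1.0}  # start by trusting the recognizer score
    for _ in range(epochs):
        for nbest, target in data:
            pred = rerank(weights, nbest)
            if pred != target:
                # Move the weights toward the target, away from the wrong prediction.
                for k, v in features(*nbest[target]).items():
                    weights[k] = weights.get(k, 0.0) + v
                for k, v in features(*nbest[pred]).items():
                    weights[k] = weights.get(k, 0.0) - v
    return weights

# Toy example (invented data): the recognizer prefers hypothesis 1,
# but the assumed target is hypothesis 0.
toy_data = [([("recognize speech", -1.2), ("wreck a nice beach", -1.0)], 0)]
w = train_perceptron(toy_data)
best = rerank(w, toy_data[0][0])
```

After training on the toy list, the learned weights rank the target hypothesis first. A practical system would typically average the weights over updates and use richer features; both are omitted to keep the sketch short.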
Similar resources
Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation
Discriminative training for machine translation has been well studied in the recent past. A limitation of the work to date is that it relies on the availability of high-quality in-domain bilingual text for supervised training. We present an unsupervised discriminative training framework to incorporate the usually plentiful target-language monolingual data by using a rough “reverse” translation ...
Unsupervised training methods for discriminative language modeling
Discriminative language modeling (DLM) aims to choose the most accurate word sequence by reranking the alternatives output by the automatic speech recognizer (ASR). The conventional (supervised) way of training a DLM requires a large amount of acoustic recordings together with their manual reference transcriptions. These transcriptions are used to determine the target ranks of the ASR outputs, ...
Lightly supervised training for risk-based discriminative language models
We propose a lightly supervised training method for a discriminative language model (DLM) based on risk minimization criteria. In lightly supervised training, pseudo labels generated by automatic speech recognition (ASR) are used as references. However, as these labels usually include recognition errors, the discriminative models estimated from such faulty reference labels may degrade ASR perfo...
Phrasal Cohort Based Unsupervised Discriminative Language Modeling
Simulated confusions enable the use of large text-only corpora for discriminative language modeling by hallucinating the likely recognition outputs that each (correct) sentence would be confused with. In [1], a novel approach was introduced to simulate confusions using phrasal cohorts derived directly from recognition output. However, the described approach relied on transcribed speech to deriv...
Unsupervised Discriminative Training of PLDA for Domain Adaptation in Speaker Verification
This paper presents, for the first time, unsupervised discriminative training of probabilistic linear discriminant analysis (unsupervised DT-PLDA). While discriminative training avoids the problem of generative training based on probabilistic model assumptions that often do not agree with actual data, it has been difficult to apply it to unsupervised scenarios because it can fit data with almos...